[SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running.#8153
[SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running.#8153rohitagarwal003 wants to merge 4 commits into
Conversation
…are already running.
|
ok to test @vanzin |
There was a problem hiding this comment.
you can avoid all mutability with
val tasks: Seq[Future[_]] = logInfos.grouped(20).map{ batch =>
replayExecutor.submit(new Runnable {
override def run(): Unit = mergeApplicationListing(batch)
})
}(I know the change to grouped is unrelated, but everytime I look at this code it confuses me for a second why we have overlapping sliding windows -- might as well as clean it up while you are messing around here)
There was a problem hiding this comment.
Thanks! I am a Scala noob - I did it the Java way. :-)
Your suggestion looks much cleaner - I have updated the PR.
|
minor style comment, otherwise makes sense to me |
|
Test build #40716 has finished for PR 8153 at commit
|
There was a problem hiding this comment.
Since we're talking about cleaning this up, you could do this also:
logInfos.grouped(20)
.map { batch =>
replayExecutor.submit(new Runnable {
override def run(): Unit = mergeApplicationListing(batch)
})
}
.foreach { task =>
// Wait for all tasks to finish. This makes sure that checkForLogs is
// not scheduled again while some tasks are already running in the
// replayExecutor.
try {
task.get()
} catch {
case e: InterruptedException =>
throw e
case e: Exception =>
logWarning("Error replaying logs.", e)
}
}
Note I added some missing exception handling, which would cause you to revert to the old behavior of piling up executions if an error happened.
There was a problem hiding this comment.
Thanks! I have updated the PR to add exception handling. I have modified the log message because errors while replaying event logs are already caught in the mergeApplicationListing method.
|
In a way I think the underlying issue is more a problem with the aggressive default polling interval (10 seconds?). But this is a way to make it better. I think in the (not so distant) future we should investigate using the recently added inotify-like API in HDFS, to see whether it helps us avoid polling altogether. |
|
Test build #40782 has finished for PR 8153 at commit
|
|
Jenkins, retest this please |
|
Test build #40805 timed out for PR 8153 at commit |
…behavior of piling up executions.
|
Test build #40857 has finished for PR 8153 at commit
|
|
Jenkins, retest this please |
|
LGTM. |
|
Test build #40900 timed out for PR 8153 at commit |
|
retest this please |
|
Test build #40933 has finished for PR 8153 at commit
|
|
Can we retest this please? |
|
Test build #1625 timed out for PR 8153 at commit |
|
retest this please |
|
Test build #40975 timed out for PR 8153 at commit |
|
As usual, flaky tests in other unrelated modules. I'll just give up on jenkins and merge this Monday morning. |
|
Merged to master, thanks! |
…are already running. Author: Rohit Agarwal <rohita@qubole.com> Closes apache#8153 from mindprince/SPARK-9924.
No description provided.